Introduction: Artificial intelligence (AI) and machine learning (ML) are increasingly used in oncology to enable personalized treatment. In multiple myeloma (MM), novel drugs and combination regimens have improved patient outcomes, but treatment response and duration vary widely because of clinical and molecular heterogeneity. To address this variability, we developed a multimodal machine learning framework that integrates gene expression and clinical data to predict individualized treatment duration. The framework enables patient stratification for therapy selection and helps identify the regimens with the longest expected benefit, or the optimal choice when expected durations are similar, supporting precision oncology in MM.

Methods: A total of 652 MM patients with paired clinical and genomic profiles from the Multiple Myeloma Research Foundation (MMRF) CoMMpass IA22 dataset were included after quality control. Quality control removed patients without treatment duration data, retained only MM-relevant clinical variables, and, for patients with multiple treatment courses, kept only the first course and its duration. We assembled a comprehensive feature set comprising 150 clinical variables (120 numerical and 30 encoded categorical) and 196,661 gene expression features derived from bone marrow aspirates. Treatment information included 17 therapeutic agents and 45 treatment regimens, covering doublet, triplet, and quadruplet combinations. All numerical features were normalized to zero mean and unit variance. To manage high dimensionality and multimodal data heterogeneity, we developed a structured machine learning pipeline that uses gradient-boosted decision tree (GBDT) feature selection to identify the clinical and molecular variables with the highest predictive value.
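The normalization step can be illustrated with a minimal sketch. It assumes plain z-score scaling of each numerical column; the function name `standardize` and the example values are hypothetical and are not taken from the study.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Scale a list of numbers to zero mean and unit variance (z-scores)."""
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation
    if std == 0:
        # A constant feature has no spread; map it to zeros instead of dividing by 0.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Hypothetical values for a single numerical clinical feature (not study data).
example_feature = [10.0, 35.0, 60.0, 80.0, 15.0]
z_scores = standardize(example_feature)
constant_feature = standardize([3.0, 3.0, 3.0])
```

In a cross-validated setting, the mean and standard deviation would normally be estimated on the training folds only and then applied to the held-out fold, so that no information leaks from the test data; the abstract does not say how this was handled.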

We evaluated three machine learning approaches for estimating treatment duration: (1) Multimodal Neural Network (MMNN), a deep learning model that integrates clinical and gene expression data through separate neural pathways for prediction; (2) Extreme Gradient Boosting (XGBoost), a gradient boosting ensemble that builds sequential decision trees while maintaining interpretability through feature importance analysis; and (3) Feature Tokenizer Transformer (FTTransformer), a transformer-based architecture that uses self-attention mechanisms to capture complex patterns in biomedical data. These models predict the number of days a patient remains on their initial therapy until discontinuation or progression. All models were evaluated with 5-fold cross-validation to ensure robustness. Model performance was assessed using accuracy, recall, F1-score, and area under the curve (AUC).
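The evaluation protocol can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the study's code: the helper names (`k_fold_indices`, `accuracy`, `weighted_recall_f1`) are hypothetical, the labels are toy values, and support-weighted averaging across classes is assumed because the abstract does not say how treatment duration was discretized or how per-class metrics were averaged.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall_f1(y_true, y_pred):
    """Recall and F1 averaged over classes, weighted by class support."""
    n = len(y_true)
    recall_sum = 0.0
    f1_sum = 0.0
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        support = tp + fn  # always > 0 because c comes from y_true
        recall = tp / support
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        recall_sum += support * recall
        f1_sum += support * f1
    return recall_sum / n, f1_sum / n


# 652 samples matches the cohort size; the split itself is illustrative.
splits = list(k_fold_indices(652, k=5))

# Toy labels for three hypothetical duration classes (not study data).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
acc = accuracy(y_true, y_pred)
rec, f1 = weighted_recall_f1(y_true, y_pred)
```

Under support weighting, recall equals accuracy by construction, so a report in which the two values coincide would be consistent with this averaging choice.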

Results: The GBDT algorithm identified 146 top features, capturing 95% of cumulative importance. These included 5 clinical variables: percentage of plasma cells in bone marrow, serum Ig lambda light chains, glucose level, R-ISS stage, and sex. The remaining 141 features were gene expression markers, including SLAMF1, a surface antigen expressed on MM cells and a therapeutic target; CDK4 and MYB, key regulators of cell cycle and transcription implicated in MM pathogenesis; NUAK1 (ARK5), implicated in plasma cell survival; and FOXP3, whose overexpression in the MM bone marrow environment suggests an accumulation of CD4 regulatory T cells that contribute to the immunosuppressive niche.
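The 95% cumulative-importance cutoff can be illustrated as follows. This sketch assumes that features are ranked by importance and kept until their running total reaches the threshold; the function name, feature names, and scores are hypothetical placeholders, not values from the study.

```python
def select_by_cumulative_importance(importances, threshold=0.95):
    """Return the top-ranked features whose importances sum to `threshold` of the total."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    cumulative = 0.0
    for name, score in ranked:
        if cumulative >= threshold * total:
            break  # the features kept so far already cover the target share
        selected.append(name)
        cumulative += score
    return selected


# Hypothetical, unnormalized importance scores (not study data).
toy_importances = {
    "feat_a": 40,
    "feat_b": 30,
    "feat_c": 20,
    "feat_d": 6,
    "feat_e": 4,
}
kept = select_by_cumulative_importance(toy_importances, threshold=0.95)
# kept == ["feat_a", "feat_b", "feat_c", "feat_d"]  (running total 96 of 100)
```

In this toy example, four of the five features are retained because the first three cover only 90% of the total importance.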

We used these features to train and evaluate the MMNN, XGBoost, and FTTransformer models. XGBoost achieved the best performance, with an accuracy of 91.2% (95% CI: 90.4%–92.1%), AUC of 0.874 (95% CI: 0.863–0.884), F1-score of 0.910 (95% CI: 0.902–0.918), and recall of 0.912 (95% CI: 0.904–0.921). MMNN and FTTransformer yielded accuracies of 85.6% and 81.2% and AUCs of 0.810 and 0.738, respectively. These results demonstrate the effectiveness of XGBoost, particularly when paired with feature selection over a high-dimensional feature space.
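The abstract does not state how the 95% confidence intervals were obtained. One common option is a percentile bootstrap over per-patient prediction outcomes, sketched below as an assumption rather than as the study's method; `bootstrap_ci` and the toy outcome vector are hypothetical.

```python
import random


def bootstrap_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for accuracy.

    `correct` holds one entry per patient: 1 if the prediction was right, else 0.
    """
    rng = random.Random(seed)
    n = len(correct)
    # Resample patients with replacement and recompute accuracy each time.
    stats = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy outcomes: 90 correct and 10 incorrect predictions (not study data).
toy_correct = [1] * 90 + [0] * 10
point_estimate = sum(toy_correct) / len(toy_correct)
ci_low, ci_high = bootstrap_ci(toy_correct)
```

Other choices, such as intervals computed across cross-validation folds or repeated runs, would give different widths, so reported intervals are only comparable when the procedure is known.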

Conclusion: This study presents a multimodal machine learning framework that accurately predicts individualized treatment duration in MM patients by integrating clinical and gene expression data. The XGBoost model outperformed the deep learning alternatives and achieved high predictive performance, suggesting immediate applicability in clinical decision-making. The proposed framework demonstrates the feasibility of integrating AI into MM treatment planning, offering physicians a data-driven tool to estimate therapy duration. Future work will focus on external validation and deployment in clinical workflows to support real-time, patient-specific decision-making.
